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Abstract 

Vortex-element  methods  are  often  used  to  efficiently  simulate  incompressible  flows  using  Lagrangian 
techniques.  Use  of  the  FMM  (Fast  Multipole  Method)  allows  considerable  speed  up  of  both  velocity 
evaluation  and  vorticity  evolution  terms  in  these  methods.  Both  equations  require  field  evaluation  of 
constrained  (divergence  free)  vector  valued  quantities  (velocity,  vorticity)  and  cross  terms  from  these. 
These  are  usually  evaluated  by  performing  several  FMM  accelerated  sums  of  scalar  harmonic  functions. 

We  present  a  formulation  of  the  vortex  methods  based  on  the  Lamb-Helmholtz  decomposition  of 
the  velocity  in  terms  of  two  scalar  potentials.  In  its  original  form,  this  decomposition  is  not  invariant 
with  respect  to  translation,  violating  a  key  requirement  for  the  FMM.  One  of  the  key  contributions  of  this 
paper  is  a  theory  for  translation  for  this  representation.  The  translation  theory  is  developed  by  introducing 
“conversion”  operators,  which  enable  the  representation  to  be  restored  in  an  arbitrary  reference  frame. 
Using  this  form,  extremely  efficient  vortex  element  computations  can  be  made,  which  need  evaluation 
of  just  two  scalar  harmonic  FMM  sums  for  evaluating  the  velocity  and  vorticity  evolution  terms.  Details 
of  the  decomposition,  translation  and  conversion  formulae,  and  sample  numerical  results  are  presented. 


1  Introduction 


Vortex  methods  solve  the  incompressible  Navier  Stokes  equation  in  the  velocity-vorticity  form  with  La¬ 
grangian  discretization.  Since  vortex  particles  are  initially  placed  only  in  the  region  of  finite  vorticity  and 
can  convect  along  with  the  flow,  these  methods  provide  an  optimized  spatial  discretization.  Consider  an 
incompressible  flow  generated  by  a  set  of  N  vortex  elements,  characterized  by  coordinates  of  the  centers 
(sources)  x,  and  constant  strength  vector  u>i,i  =  1, N.  Each  element  centered  at  location  x,  produces  an 
elementary  velocity  field  v,;  (y)  according  to  the  Biot-Savart  law,  and  the  total  velocity  field  can  be  computed 
as  a  superposition  of  such  elementary  fields  (e.g.,  see  [1]): 
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In  practice  this  field  needs  to  evaluated  at  M  evaluation  points,  yj5  which  has  ()(M N)  cost.  The  Biot-Savart 
kernel  is  composed  of  a  vector  of  dipole  solutions  of  the  Laplace  equation.  It  is  well  known  that  the  Fast 
Multipole  Method  (FMM)  can  be  used  to  evaluate  such  sums  to  any  specified  accuracy  e  at  a  0(N  +  M) 
reduced  cost  [6]. 

The  vortex  elements  move  with  the  flow.  This  motion  also  causes  an  evolution  of  the  vortex  field 
according  to  the  vortex  evolution  equation.  For  inviscid  flow,  the  evolution  equations  for  the  vortex  positions 
and  strengths  respectively  are 

fix,-  dui  „ 

—rr  =  w;  —  =  s*,  s*  =  ut  •  Vv.  (2) 

dt  dt 

Here  the  right  hand  side  for  the  vortex  strength  is  the  so-called  vortex  stretching  term,  the  evaluation  of 
which  requires  an  evaluation  of  the  gradient  of  the  velocity  vector.  The  evolution  equation  for  the  vorticity 
can  be  modified  to  account  for  liquid  viscosity.  In  the  latter  case  the  elementary  velocity  field  in  Eq.  (1)  can 
be  modified  using  a  smoothing  kernel,  K(\y  —  x,  ;  t), 

Vi  (y;t)  =  X  (y  3X^-*f(|y  -  xj  ;t),  K(r;  t)  =  1  +  0(e),  r  >  a(t),  (3) 

|y  -  Xj| 

which  has  an  effect  only  on  the  near  field,  r  ^  a,  where  a  is  the  radius  of  the  vortex  core,  and  e  <C  1  is 
the  tolerance  for  approximation.  This  modification  does  not  affect  the  far  field,  which  is  computed  with 
the  tolerance  O  (e).  The  computation  of  the  far  field  can  still  be  accelerated  by  the  FMM.  In  the  sequel  we 
do  not  specify  the  core  function  K{r ;  t);  several  choices  including  the  Gaussian  and  polynomial  forms  arc 
discussed  in  the  literature  (see  e.g.,  [15,  19,  20]),  and  there  arc  several  ways  to  speed  up  the  local  summation 
as  well  (e.g.,  [17]).  Extensions  to  compressible  flow  arc  also  possible  [3]. 

The  evolution  equation  is  integrated  using  an  appropriate  time  stepping  scheme.  The  right  hand  side 
of  this  equation,  also  results  in  an  A'-body  computation  for  the  influence  of  particles  in  the  far-held,  with 
somewhat  more  complicated  terms.  As  discussed  above,  the  FMM  for  the  vortex  element  method  is  closely 
related  to  the  scalar  FMM  for  sums  of  multipoles  of  the  Laplace  equation  (“harmonic  FMM”).  In  fact,  it 
is  possible  to  start  with  a  program  written  for  performing  a  harmonic  FMM  and  appropriately  modify  it  to 
create  a  fast  vortex  method  software.  A  key  issue  in  the  implementations  is  the  ratio  of  the  amounts  of  time 
taken  for  computing  a  harmonic  FMM,  to  the  amount  of  time  taken  to  compute  the  vector  sum  for  the  vortex 
velocities,  and  the  vector  sum  for  computing  the  right  hand  side  of  the  vortex  evolution  equations. 

In  [20]  the  computations  for  evaluating  the  stretching  term  is  shown  to  be  about  six  times  the  evalua¬ 
tion  of  the  velocity  alone,  while  in  [14]  the  cost  of  evaluating  the  velocity  is  shown  to  be  a  bit  less  than 
twice  the  cost  of  evaluating  a  single  harmonic  FMM  sum.  This  suggests  that  in  current  implementations 
of  FMM  accelerated  vortex  methods  the  standard  cost  of  evaluating  a  vortex  element  method  computation 
(velocity+stretching)  is  an  order  of  magnitude  larger  than  a  single  harmonic  FMM.  For  a  similar  problem, 
of  evaluating  sums  of  Stokeslets  and  Stresslets  (elementary  solutions  of  Stokes’  equations)  [18]  show  that 
the  evaluation  can  be  done  with  a  cost  of  four  to  six  harmonic  FMM  calls. 

In  this  paper  we  develop  an  extremely  efficient  version  of  the  FMM  for  vortex  element  methods,  which 
achieve  an  evaluation  of  both  the  velocity  and  stretching  term  sums  at  a  cost  of  only  about  two  harmonic 
FMMs.  This  provides  a  substantial  speed  up  of  vortex  element  methods.  Our  basic  tool  is  the  use  of  the 
Lamb-Helmholtz  decomposition  [16]  which  allows  the  representation  of  the  vector  field  in  the  form  of  two 
scalar  potential  fields.  This  form  is  however  not  invariant  to  translation,  and  cannot  be  used  as  is,  within 
the  context  of  an  FMM  summation  algorithm.  We  develop  conversion  operators  that  allow  this  form  to  be 
translated.  In  some  sense  the  method  which  we  develop  in  this  paper  is  similar  to  the  method  of  translation  of 
solutions  of  the  biharmonic  and  Maxwell's  equations  that  we  developed  previously  in  [11,  12]  respectively. 

In  [11]  it  was  shown  that  any  solution  of  the  biharmonic  equation  can  be  expressed  as  a  combination  of 
two  solutions  of  the  Laplace  equations.  However,  when  the  biharmonic  solution  is  expressed  in  this  form. 
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the  translations  cannot  be  done  independently.  Instead  the  two  functions  must  be  translated  jointly,  which 
can  be  handled  relatively  easily  by  the  introduction  of  the  concept  of  a  sparse  “conversion”  operator.  In  fact 
given  a  fast-multipole  method  routine  for  the  Laplace  equation,  we  show  there  that  using  the  conversion 
operators,  it  can  be  employed  as  a  fast-multipole  method  routine  for  the  biharmonic  and  polyharmonic 
equations.  In  [12]  a  translation  theory  for  the  Debye  form  of  the  solution  of  Maxwell’s  equation,  in  terms 
of  two  scalar  potentials  that  satisfy  the  Helmholtz  equation  was  presented.  Here  we  present  an  analogous 
theory  for  the  divergence  constrained  Helmholtz-Lamb  potential  decomposition  of  the  velocity  field,  which 
arc  closely  related  to  vorticity  formulations,  and  show  how  they  can  be  used  in  vortex  element  methods. 

Section  2  of  the  paper  introduces  the  problem  and  notation,  and  shows  that  the  equations  can  be  consid¬ 
ered  to  be  solutions  of  a  divergence  constrained  vector  Laplace  equation.  Section  3  introduces  the  translation 
theory  developed  for  such  equations.  Section  4  shows  how  the  new  translation  theory  along  with  a  program 
for  performing  fast  multipole  method  accelerated  sums  for  multipoles  of  the  Laplace  equation,  can  together 
be  used  to  create  a  FMM  accelerated  vortex  element  solver.  Together  Sections  3  and  4  represent  the  main 
mathematical  results  of  the  paper.  Section  5  presents  the  results  of  numerical  testing,  and  shows  that  using 
the  current  form  of  the  decomposition,  substantial  savings  can  be  made  with  respect  to  state  of  the  art  FMM 
accelerated  vortex  element  techniques.  Section  6  concludes  the  paper. 

2  Statement  of  the  problem 

We  are  given  N  vortex  blobs  of  strength  ut,  i  =  1, ....  A’  located  at  points  x,  and  moving  with  the  flow.  The 
velocity  field  can  be  evaluated  using  either  Eq.  (1)  or  Eq.  (3),  which  both  have  the  same  asymptotic  far- Held 
form.  The  evolution  of  the  vortex  positions  and  the  vortex  strengths  is  given  by  Eqs.  (2).  We  note  now  that 
at  y  /  Xj,  i  =  l. ....  N.  the  velocity  field  v  (y)  satisfies  divergence  constrained  vector  Laplace  equation 
(DCVLE) 

V2v  =  0,  Vv  =  0.  (4) 

In  the  case  when  the  divergence  of  the  field  is  not  constrained,  each  Cartesian  component  of  the  velocity  is 
an  independent  harmonic  function,  and  if  the  FMM  for  the  Laplace  equation  is  available  this  can  be  solved 
at  a  cost  of  execution  of  3  harmonic  FMMs.  The  divergence  constraint,  however,  reduces  the  degree  of 
freedom  for  solutions  by  1,  and,  in  fact,  only  two  harmonic  scalar  potentials,  ([)  and  y,  are  necessary  to 
describe  the  total  field  inside  or  outside  a  sphere  centered  at  the  origin  of  the  reference  frame 

v(r)=V0(r)  +  V  x  (ry(r)) ,  V2</>  =  0,  V2y  =  0.  (5) 

This  decomposition  can  be  treated  as  a  general  Helmholtz  decomposition  of  an  arbitrary  vector  field,  with 
a  particular  form  of  the  vector  potential.  Presumably,  this  form  is  due  to  Lamb  [16],  who  found  a  general 
solution  for  the  Stokes  equations  in  spherical  coordinates,  and  we  refer  to  this  as  the  Lamb-Helmholtz 
decomposition.  Indeed,  Eq.  (4)  can  be  treated  as  the  Stokes  equations  with  zero  pressure,  and  the  Lamb 
solution,  where  the  pressure  is  zeroed  out  provides  the  form  (5). 

The  major  problem  with  the  use  of  this  representation  in  the  FMM  is  providing  a  form  that  is  invariant 
to  translation.  This  can  be  done  by  using  conversion  operators,  as  was  presented  for  the  biharmonic  and 
Maxwell  equations  in  our  previous  publications  [11,  12],  In  this  paper  we  develop  such  operators  and 
present  a  complete  translation  theory  for  the  DCVLE.  We  also  use  this  theory  with  existing  harmonic  FMM 
software  to  compute  both  the  velocity  field  induced  by  N  vortex  elements  (1)  and  for  the  evaluation  of  the 
velocity  gradient  for  an  expense  of  execution  of  only  2  real-valued  harmonic  FMMs.  In  fact,  as  was  shown 
in  [1 1],  is  also  possible  to  use  just  one  FMM  for  a  complex  valued  harmonic  functions  (for  real  solutions  of 
Eq.  (3))  to  solve  the  problem,  which  may  also  have  certain  computational  advantages. 
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It  is  not  difficult  to  show  that  the  DCVLE  appeal's  as  a  general  equation  for  reconstruction  of  an  arbitrary 
vector  field  from  given  curl,  u:  (r),  and  divergence,  q  (r), 

Vxv  =  u,  Vv  =  q.  (6) 


Indeed,  solution  of  these  equations  in  free  space  can  be  written  in  the  form  (e.g.  see  [1]): 

V  (y)  =  ~ Vy  [  - — y -  - — rdv  (x)  +  Vy  x  [  - — y — — -dV  (x) . 

;  yJv  4vr|y-x|  ;  y  Jv  4vr  |y  -  x|  K  J 


(V) 


Subdividing  the  space  to  the  vicinity  of  evaluation  point  y  (near  field)  and  the  domain  outside  this  neighbor¬ 
hood  (far  field)  and  discretizing  the  integrals  for  the  far  field  using  quadratures  with  weights  wt  and  nodes 
Xj,  we  obtain  for  the  far  field  contribution 

v/(y)  =  ^v/*(y)>  vfi  (y)  =  -v  *  |  + v  x  |  | ,  (8) 

y  X-i  y  X-i 

i 

_  Wiq  (xj)  _  WiU  (x^ 

9i  ~  4vr  ’  _  4vr 

Hence,  the  far  field  satisfies  Eq.  (4)  for  which  decomposition  (5)  can  be  used  and  just  an  addition  to 
potential  <f>  due  to  a  given  monopole  distribution  q  (x)  provides  solution  for  a  general  case.  As  mentioned, 
in  the  present  paper  we  do  not  address  computation  of  the  near  field,  which  can  be  done  locally,  e.g.  using 
appropriate  smoothing  kernels.  Note  that  solution  (7)  of  Eq.  (6)  is  unique  up  to  a  gradient  of  a  harmonic 
function  <h,  which  should  be  found  from  the  boundary  conditions.  Such  functions  for  a  given  boundary  can 
be  added  naturally  to  (p  (r)  in  Eq.  (5). 

Equations  (6)  with  q  /  0  appear,  e.g.  in  vortex  methods  for  compressible  flows.  An  example  of  equa¬ 
tions  (for  2D)  can  be  found  in  [3],  which  can  be  appropriately  modified  for  3D.  In  terms  of  computational 
complexity,  besides  the  velocity  field  and  stretching  term  computations,  also  contraction  of  the  velocity  gra¬ 
dient  tensor,  (3  =  Vv  :  Vv  should  be  computed  in  this  case.  This  term  can  be  computed  simultaneously  with 
computation  of  the  vortex  stretching  term.  Thus  the  cost  in  these  extended  cases  should  remain  the  same. 


3  Translation  theory  for  DCVLE 

3.1  Basic  translation  and  differential  operators 

Translation  operator:  A  generic  translation  or  shift  operator  T(t),  where  t  is  a  constant  termed  the  trans¬ 
lation  vector,  acts  on  some  scalar  valued  function  6  (r) ,  to  produce  a  new  function  (p  (r)  (the  translate), 
whose  values  coincide  with  (r)  at  shifted  values  of  the  argument 

</>0)  =  T(t)  [0  (r)]  =  (j)  (r)  =  (j)  (r  +  t) ,  r,t6l3.  (9) 

This  operator  is  linear,  and  if  < j>  (r)  satisfies  the  Laplace  equation,  its  translate  0  (r)  satisfies  the  same  equa¬ 
tion  in  the  shifted  domain. 

Elementary  directional  differential  operators:  We  introduce  the  following  notation  for  differential 
operators  which  appeal'  in  derivations: 

Dr  =  r  •  V,  T>t  =  t  •  V,  XW  =  (r  x  t)  •  V.  (10) 

It  is  not  difficult  to  prove  that  if  <j>  (r)  is  a  harmonic  function  in  some  domain,  then  Vv0  ,  Vtcp,  and  Vrxt(p 
are  also  harmonic  functions  in  the  same  domain.  Note  also  that  operators  V t  and  Vrxt  are  related  to 
infinitesimal  translation  in  direction  of  vector  t  and  infinitesimal  rotation  with  the  rotation  axis  t. 
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3.2  Conversion  operators  for  the  DCVLE 

Consider  now  translation  of  the  vector  function  v  (r)  given  in  form  (5)  using  notation  (9)  for  the  translated 
functions.  We  have 

v  (r)  =  v  (r  +  t)  =V</>  (r  +  t)  +  V  x  ((r  +  t)  x  (r  +  t))  (11) 

=  (r)  +  V  x  ((r  +  t)  x  (r))  =  (r)  +  V  x  (r%  (r))  +  V  x  (t%  (r)) . 

It  is  clear  that  the  obtained  form  of  expansion  of  the  translated  function  does  not  coincide  with  form  (5). 
Our  goal  is  to  find  harmonic  functions  (j>  and  x,  which  provide  a  representation  v  in  the  form  5),  i.e. 


v(r)  =  V0  + V  x  (rx),  (12) 

we  introduce  “conversion”  operators  ,  i,j  =  1,2  : 

<t>  =  Cn4>  +  C\2X,  X  =  C2i4>  +  C22X1  (13) 

which  arc  linear-,  due  to  the  linearity  of  all  transforms  considered. 

Comparing  representations  (11)  and  (12)  we  can  deduce,  that  <f>  (r)  contributes  only  to  (f>  (r) ,  which 
leads  to 

Cu  =  X,  C21  =  0,  (14) 

where  X  is  the  identity  operator.  So,  we  can  introduce  harmonic  functions  cf>'  and  x'  according  to  the 
following  relations 

4>  =  <^+c12x  =  </>  +  </>',  (15) 

X  =  C22X  =  x  +  x'- 

Having  two  representations  of  v,  (1 1)  and  (12),  and  using  Eq.  (15),  we  obtain 

V^'  +  Vx  (rx')  =  V  x  (tx)  •  (16) 

Taking  scalar-  product  with  r  and  noticing  that  r  •  V  x  (rxO  =  0 

r  •  V07  =  r  •  V  x  (tx)  =  r  •  (Vx  x  t)  =  —  (r  x  t)  •  Vx-  (17) 

Another  relation  can  be  obtained  if  we  take  the  curl  of  expression  (16): 

V  x  V  x  (rx')  =  V  x  V  x  (tx)  •  (18) 

It  is  not  difficult  to  check  that  for  harmonic  functions  x'  and  x  the  following  identities  hold: 

V  x  V  x  (rx')  =  V(x'  +  r-Vx'),  (19) 

V  x  V  x  (tx)  =  V  (t  •  Vx)  • 

We  note  further,  that  all  scalar  potentials  are  defined  up  to  a  constant,  so  for  the  space  integration  an  arbitrary 
constant  can  be  added  or  dropped.  Therefore,  we  obtain  from  Eqs  (18)  and  (19): 

x'  +  r  -  Vx'  =  t  •  Vx-  (20) 

Using  notation  (10)  we  can  rewrite  relations  (17)  and  (20)  in  the  form 

7V/>'  =  ~Vrxtx,  (X  +  Vr)  x'  =  Vt%  (21) 
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In  the  next  sections,  we  consider  functions  represented  via  harmonic  expansions,  and  show  that  operators 
Vr  and  (1  +  Vr)  arc  invertible,  in  which  case,  we  can  write  solution  in  the  form 

4>  =  (j>-V '~lVrxtx,  (22) 

x  =  x  +  {T  +  vrylvtx- 

Comparing  this  with  Eq.  (15),  we  obtain  the  following  expressions  for  the  conversion  operators 

C12(t)  =  -V~1Vrxt,  c22(t)  =  l+(l  +  Vr)-lVt.  (23) 

3.3  Expansions  of  harmonic  functions 

In  addition  to  the  Cartesian  coordinates  of  points  and  vectors  we  will  use  spherical  coordinates  (r,  9 ,  cp)  in 
3D: 


r=  (x,  y,  z)  =  r  (sin  9  cos  ip,  sin  9  sin  <p,  cos  9) ,  r  gM3. 


(24) 


For  expansions  of  the  solutions  of  the  Laplace  equation  in  3D  that  are  regular  inside  or  outside  a  sphere 
centered  at  the  origin  of  the  reference  frame  we  introduce  the  regular  or  local  functions  /?"'  (r)  and  singular 
or  multipole  functions  S"1  (r),  that  are  respectively  defined  as 


K  (r) 
STM 


(-1  )n*M 

(n  +  |m|)! 
1 


rnPkn\n)e*mv, 


p  =  cos  9, 

n  =  0,1, 


m  =  — n, n, 


(25) 


where  P™  (/i)  arc  the  associated  Legendre  functions,  defined  by  Rodrigues’  formula 


(-l)m  (l  -  p2)m/2  dm+n 
2  nn\  dpm+n 


Pn  (M)  = 


(/i2  —  l)n  ,  m  ^  0. 


These  functions  arc  related  to  each  other  via 


(26) 


S. T  (r)  =  (-l)n+m  (n  -  m)\(n  +  m)\r-2n~lR™  (r) .  (27) 

The  functions  7?™  (r)  and  S"1  (r)  defined  above  coincide  with  the  normalized  basis  functions  I”1  (r)  and 
O™'  (r)  considered  in  [4]  and  normalized  spherical  basis  functions  in  [11],  The  expansion  of  the  Green’s 
function  in  this  basis  is 

oo  n 

|r  —  E  Rnm(~ ro)STM,  r  >  r0.  (28) 

n= 0  m=—n 

Further,  we  represent  harmonic  functions  in  terms  of  sets  of  expansion  coefficients  over  a  certain  basis 
centered  at  a  given  point,  e.g.  the  local  and  multipole  expansions  centered  at  the  origin  are 

oo  n  oo  n 

=  E  E  WW  *(r)  =  S  Y,  x”Or),  F  =  R,S.  (29) 

n= 0 m=—n  n= 0 m=—n 

Absolute  and  uniform  convergence  of  these  series  in  the  expansion  regions  is  assumed  everywhere  below. 
We  also  extend  the  definition  of  the  basis  functions  for  arbitrary  order  m,  to  shorten  some  expressions 

(r)  =  (r)  =  0,  \m\  >  n,  n  =  0,1,...  (30) 
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3.4  Matrix  representation  of  operators 


Let  £  be  a  linear  operator,  such  that  for  harmonic  function  cp,  ip  =  C<j>  be  also  a  harmonic  function.  Assume 
further  that  both  <p  and  ip  can  be  expanded  in  to  series  of  type  (29).  There  should  be  a  linear  relation 
between  the  expansion  coefficients  4/  =  {ip™}  and  $  =  {<///},  which,  generally  speaking,  will  have  a  form 
'L  =  L*I\  where  L  is  a  matrix,  or  representation  of  £.  Of  course,  for  a  given  £  the  matrix  L  depends  on  the 
bases  over  which  the  expansion  is  taken. 

Let  cp  (r)  be  expanded  over  basis  {l7"l(r)},  while  ip  (r)  be  expanded  over  basis  {G”l(r)}.  Action  of 
operator  £  on  a  basis  function  F"'  (r)  can  be  represented  as 


00  n' 

(31) 

n'= 0  m'=—n' 


£iC(r)  = 


=  V 


Tin  m  /-rm 

L r 


n  =  0,1, 


m  =  — n, ...,  n, 


where  L//'//'  arc  the  reexpansion  coefficients.  It  is  not  difficult  to  show  then  that  the  entries  of  matrix  L  arc 
,  i.e.  L  is  the  matrix  transposed  to  the  matrix  of  reexpansion  coefficients.  Indeed 


£  £  </CG”(r) 

n= 0  m=—n 


00  n' 

V(r)=^(r)=£  £  «'£F„r>'(  r) 

n'= 0  m'=—n' 


(32) 


00  n' 


00  n 


££«'££  Ln™‘G'0r) 

n'=0m'=— n1  n=0m=—n 


£  £ 


00  n 

£  £ 


n=0m=—n  \_n'=0  m'=—n' 


Gm 

n 


(r) 


Reexpansion  coefficients  for  the  translation  operator  7t,  Eq.  (9),  in  the  local  and  multipole  bases  (25)  can 
be  simply  expressed  via  the  respective  basis  functions  (see  [4]  and  [1 1]): 


TtiC(r)  =  E  E  (^iatr(t)C'(r),  F,G  =  S,R,  (33) 

n'= 0  m'=—n' 

(R\R)™:™{t)  =  iC-T'(t),  (S\R)™:™  (t)  =  S™-™'  (t) ,  (5|5);lm(t)  =  F”rf(t). 


To  obtain  representations  of  other  operators  appeared  above,  we  will  use  differential  relations  for  the  basis 
functions,  which  also  can  be  found  in  [4]  and  [1 1]: 


VzRn 

(r) 

-  -iC-iW, 

V,SZ  (r)  = 

~S:\  1  (r) , 

'T')  ~Rrn 

^x+iy-^n 

(r) 

=  «™+'(r), 

Vx+iyS™P r) 

=  iSZtf  (r. 

T) 

Lyx—iyjr*"n 

(r) 

iri 

• 

11 

VX-iySZ{  r) 

=  iS%: t1  (r 

where 


Rx±iy  — 


d_  ,d_ 

dx  dy  ’ 


Vz  = 


d_ 

dz 


(34) 


(35) 


3.4.1  Operator  Vr. 

To  obtain  the  representation  of  this  operator  there  is  no  need  to  consider  differential  relations,  since  accord¬ 
ing  to  definition  (10)  we  have  Vr  =  r  (d/dr)  and  from  Eq.  (25)  we  have 

VrR™(r)=nR™(v),  VrS™  (r)  =  —  (n  +  1)  S™  (r) ,  n  =  0,l,...  (36) 

( R)  ( R) 

This  shows  that  matrices  Dr  and  Dr  representing  this  operator  are  diagonal  and  have  entries 

( Dr  )  =ndmm'6nn /,  ( Df.  )  = -(n  +  l)Smm'Snn>,  n  =  0,1,...  (37) 

\  /  nn'  \  /  nn' 
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where  tirnrn>  is  the  Kroncckcr  symbol. 

Conversion  operators  (23)  contain  inverse  operators  T>~  1  and  (Z  +  Z>r)_1,  which  also  arc  diagonal.  It 

(  Ft} 

may  be  a  cause  for  concern  for  the  inverse  operators  that  zeros  appeal-  on  the  diagonal  of  matrix  Dr  and  on 
the  diagonal  of  matrix  I  +  Dr  at  n  =  0.  However,  these  are  easily  dispensed  with.  For  the  former  case  we 
note  that  harmonic  n  =  0  corresponds  to  a  constant  basis  function  R{ j  (r) .  Eq.  (22)  shows  that  this  affects 
only  the  constant  added  to  potential  q i,  which  obviously  does  not  affect  the  velocity  field  (5),  and  can  be  set 
to  an  arbitrary  value,  e.g.  to  zero.  For  the  latter  case,  Eq.  (11)  shows  that  harmonic  n  =  0  in  multipole 
expansion  of  function  x  also  does  not  affect  the  velocity  field,  since 

V  x  (rr^1)  =  — r  x  V  (r-1)  =  r~2 r  x  r  =  0.  (38) 

Since  operator  ^1  +  D(v  j  is  needed  only  to  determine  converted  function  %  (see  Eq.  (22)  this  also 
can  be  set  to  zero.  In  other  words,  for  the  purpose  of  computation  of  the  conversion  operators  the  inverse 
operators  for  singular  matrices  can  be  defined  as  follows 

n-1,  n  >  0 
0,  n  =  0 

(39) 


D 


(R) 


-1 


Om.m./  3 


mm 


n  1,  n  >  0 
0,  n  =  0 


I  +  d[S) 


-i 


tim  in  '  Or, 


mm'  wnn' 


3.4.2  Operator  Vt. 

From  definitions  (10)  and  (35)  we  have 
0  Q  Q  1 

Xt  —  tx  T  4"  —  9  [(^  ^y)  'Dx+iy  T  {tx  T  Xy )R x—iy\  T  tzT)z 

Using  Eq.  (34)  we  obtain 


(40) 


DtK  =  +  (41) 

as™  =  i[(tv  +  itI))C+11-(t„-itI)s™-i1]-t,s”1. 

(Ft}  ( 

Taking  into  account  that  matrices  Dj;  and  7  are  transposed  to  the  reexpansion  matrices,  we  obtain 


D(R)^ 

D 


t  1  , 

(S)^  rnrn' 


3n+l,n' 
=  3n—l,n' 


2  (^y  'itx)  ^m—  I,??!'  2  (^2/  ^x)^m-\-l,m'  tzfimm' 

—  (ty  +  itx)  &m—l,m'  ~  o(ty~  itx  ,m'  tz^mm' 


(42) 


2  1  2' 

This  also  can  be  rewritten  in  terms  of  the  expansion  coefficients  of  functions  ^  and  </>,  ^  =  Vt<f !>, 

C  =  E(D is|)“‘ =  1  [(t, + a.) c  - ((. - (‘.JC1]  - ‘.*1.  (44) 

z \  /  nn'  Z 

n,m 

c  =  E  (DiS)) «'  =  5  [<*!,+»*„)  ex  -  (ty  -  itoc-i1]  -  we., ■ 

\  /  nn'  Z 


3.4.3  Operator  Prxt- 

From  definitions  (10)  and  (35)  we  have 

d  d  d 

£>rxt  =  ( ytz  -  zty )  —  +  (ztx  -  xtz)  Q-  +  ( xty  -  ytx)  — 


e±  = 


tx  T  Hy) 
x  ±iy 


Z-Vz  -  ^ ZVX-iy 


4-  ‘iij'x  tty) 


dz 

£,+Rz  ~  ^Z^x+iy 


(44) 

4-  itz  \^—RX-\-iy  £-f -'Dx—iy]  , 


Consider  first  action  of  this  operator  on  basis  functions  i?™  (r).  The  following  relations  derived  in  [11]  arc 
useful  in  this  case: 


t-K  =  -in-™  +  2K+l-l-zRn 


2  n 

>m— 1 


(45) 


We  have  then,  using  these  relations  and  Eq.  (34) 


^,-Rz  -  -ZVX-iy 

ti+Rz  ~  2  zT^x+iy 


k  =  -e-jff-i  -  -jzr™ii  =  r  ":-ri 


1—1  _  .n-m  +  l  1 


(46) 


nra  _  _t  pin  _  pm+1  _  „-n  +  m  4~  1  pm+l 
s+-rtn— 1  2  i  ;  2  r^n  ’ 


[£-P*+iy-efPx-iy]iC  =  iC-^C-11  -  ^K-11  =  -miC- ■ 

Now  we  obtain  from  Eq.  (44) 

VrxtR ™  =  ( tx  +  ity)n~™  +  lR™~ 1  -  (4  -  Zty)W  +  ^  +  1iC+1  -  itzfnRr, 

To  get  a  similar  relation  for  basis  functions  S ™  (r)  we  can  use  relation  (27).  Using  identity  (r  x  t)  • 
V  [/  (r)  g  (r)]  =  /  (r)  (r  x  t)  •  X7g  (r)  and  Eq.  (47),  we  obtain 

(48) 


->m 
tn  . 


(47) 


T>rxtS?  =  (— l)n+m  (n  —  m)\(n  +  m)\r 


r>m 


=  —(fa;  +  1  +  (tx  -  S^+1  -  itzmS, 


<m 
n  • 


(48): 

D 

D 


Expressions  for  the  representing  matrices  for  the  local  and  multipole  bases  follow  from  Eqs  (47)  and 


,(*)' 

mm 

rxt 

)  , 

j 

nn' 

,(S)N 

.  mm' 

rxt 

/ 

'  rm' 

j.  L  ,n'  -  m!  +  1  n'  +  m'  +  l  ,  , 

—  Onn'  V’X  T  ity)  ^  0m 4-l)TO'  ~  (tx  ~  tty)  ^  *m—l,m'  ~  ltz7Tl  0mrr{^y) 

X  \  U  >  ■+  \n' +  m'  X  U  ■+  \n'  ~  m'  X  •+  X 

—  Onn'  (tx  T  tty)  —  Um+l,m'  T  (tx  tty j  —  ltzUl  Omrn/ 

This  also  can  be  rewritten  in  terms  of  the  expansion  coefficients  of  functions  and  d,  ip  =  'Drxt4>i 

c  =  =  <5o> 


**  =  E  (DS)t)”T’  «'  =  -(*.  +  it,)n  +  ”  +  1C+1  +  (4  -  ity)-  ”  +  1g~‘  -  I4™#>: 
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3.4.4  Conversion  operators 


It  is  not  difficult  to  obtain  matrix  representations  for  the  conversion  operator  from  Eqs  (14)  and  (23)  and 
expressions  for  the  differential  operators  derived  above.  A  more  compact  form  relating  the  expansion  coef¬ 
ficients  of  functions  in  Eq.  (22)  for  the  /(-expansions  is 


bm 


=  c-- 


n 


Xn  =  Xn  + 


/  ,  77  —  777  ii  ,  ,  77  ~b  777  _ i  . 

( tx  +  ity)  2  Xn  ~(tx~  ity)  2  Xn  -  ltzmx. 

^  \\(ty  +  -  \(t,  -  itxJSi1  -  t,S+ 1 


,  (^8  =  0g)51) 


Si  mil  aiiy,  for  the  5-expansions  we  have 


=  c+ 


1 


n  +  1 


,  .  .  n  +  m  +  1  ™  ,  .  .  .  n  —  m  +  _ 

-(/x  +  tty) - - - Xn  +  {tx  -  tty) - - - Xn  -  ltzmxr 


Xn  Xr 


1 

n 


2 

1 


(52) 


2  (*1/  +  ttx) X’n—  1  -  2  ( ty  -  tt^Xn-l  -  tz5C-l 


(x°  =  Xo)  • 


3.5  RCR-decomposition 

In  many  FMM  codes  translations  arc  performed  using  the  rotation-coaxial  translation-back  rotation  (RCR) 
decomposition  of  the  translation  operators  ,  which  reduces  translation  cost  of  expansions  of  length  p 2  to 
0(p3),  opposed  to  ()(p4)  required  for  the  direct  application  of  the  translation  matrix  (e.g.  see  [11]).  Such 
a  decomposition  is  also  beneficial  for  faster  conversion,  since  the  rotations  do  not  change  the  form  of  de¬ 
composition  of  the  vector  field  (5)  and  there  is  no  need  to  rotate  cj)  and  \  in  conversion  operators.  Coaxial 
translation  means  translation  along  the  ^-direction  to  distance  t,  in  which  case  expressions  for  the  conversion 
operators  (51)  and  (52)  become  even  simpler  (t  =  tiz,  tx  =  ty  =  0,  tz  =  t ): 

/(-conversion  :  (C  =  (C  +  ^Xn,  (</>o  =  ^o)  ,  Xn  =  Xn  ~  ^yXn+i>  (53) 

5-conversion  :  C  =  C  -  it^~[Xn  ,  Xn=Xn+^Xn-i,  (xo  =  Xo)  • 


3.6  Other  harmonic  bases 

In  the  case  of  the  use  of  other  than  {R™}  and  {  5"' }  bases,  i.e.  { /(']"  }  and  { S'™  } ,  matrices  representing  the 
operators  should  be  modified  accordingly.  Generally  speaking,  we  can  introduce  basis  reexpansion  matrices, 
B  //  !  and  B1'  3'-1,  for  the  local  and  multipole  bases,  so 


Km(r)  = 


oo  n 

-E  E 

n'= 0  m'=—n' 


b(f) 


F™  (r) ,  F  =  S,R. 


(54) 


Which  results  in  conversion  of  expansion  coefficients  in  basis  F  to  expansion  coefficients  in  basis  F’  with 
matrix  B^f),  C'  =  B(i?)C.  Assume  now  that  L  is  the  matrix  representing  operator  C  in  basis  F,  for  which 
we  have  expressions  derived  above.  Matrix  L/,  representing  the  same  operator  in  basis  F’  then  will  be 
L/  =  B  1  L  |  B1 1  ,  where  invertibility  is  provided  by  the  fact  that  both  systems  { F'[n }  and  { F"1 }  arc 

complete  (bases).  In  the  case  of  complex  harmonic  bases,  basis  transform  matrices  arc  usually  diagonal,  i.e. 


B(i* ']  =  5nn’Smm'B^F)m,  n  =  0,1, ...,  m  =  -n, ...,  n. 


(55) 
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In  this  case  the  above  relations  can  be  easily  modified.  For  example,  conversion  relations  for  coaxial  trans¬ 
lations  (53)  for  coefficients  in  bases  R'  and  S'  become 


/(-conversion 


5-conversion 


<C  = 
4fr  = 


'e+it-xT, 


n 


A'O 


~/m  =  wm 
An  in 


t  B 


( R)m 


n  +  1  B 


^n+l56) 


n+1 


—  it- 


m 


n  +  1 


yfm 

An  j 


t  /-? 

~/m  _  C;fm  j _ _  ■ 

Xn  An  r 


(5)m 


n  B 


(S)m 


xT-u  (x?  =  x?). 


n—  1 


This  also  can  be  checked  by  substitution  (fJ™  =  B^m<\>™  for  all  functions  with  hats  and  tildes  (similarly 
for  x)  in  Eq.  (53). 


4  Fast  multipole  method 

There  is  an  extensive  literature  on  the  FMM  for  the  3D  Laplace  equation  and  several  implementations  have 
been  descriped.  Because  of  this,  we  do  not  see  any  need  to  describe  this  algorithm  again,  and  refer  the  reader 
to  papers,  which  in  details  discuss  the  method  [6,  2,  7,  10,  13]. 

4.1  Mapping  of  a  real  vector  field  to  a  complex  harmonic  function 

It  is  proposed  in  [11]  to  modify  an  available  FMM  routine  for  complex  valued  harmonic  function  to  an 
FMM  routine  which  provides  the  FMM  for  real  valued  biharmonic  functions.  So  just  one  complex  FMM 
can  be  executed  instead  of  two  FMMs  for  the  real  functions.  Our  tests  show  that  such  an  approach  provides 
a  small  advantage  compared  to  the  FMMs  for  real  harmonic  functions  (which  modification  requires  a  bit 
more  analytical  work,  to  convert  matrix  representations  of  the  conversion  operators  of  the  present  paper 
given  in  complex  bases  into  respective  expressions  in  the  real  bases,  e.g.  used  in  GPU  implementation  [13]). 
Anyway,  this  method  can  be  taken  and  applied  directly  to  the  present  case,  since  a  complex  valued  harmonic 
function  T  (r)  can  be  composed  from  two  real  functions  cj)  (r)  and  x  (r),  as 

^  0)  =  <t>  (r)  +  ix  (r)  •  (57) 

Further  the  translation  algorithm  for  T  (r)  will  be  exactly  the  same  as  for  the  biharmonic  functions,  de¬ 
scribed  in  [11],  with  the  only  difference  in  the  conversion  operators,  where  relations  (53)  should  be  used 
instead  of  conversion  relations  for  the  biharmonic  functions. 

4.2  Expansion  of  elementary  velocities 

Consider  multipole  expansion  of  v;  (y)  given  by  Eq.  (1)  about  the  center,  x*,of  a  source  box  containing  x^. 
Using  Eq.  (28),  we  obtain 

oo  n 

V,(y)  =  Vxj7^7|=E  E  CFtoW,  F£(r)=VxhS™(r)], 

n=0 m=—n 

(Jin  =  Rnm(~rlX  r  =  y-x*,  r/  =  x;-x*  (58) 

Note  then  that  is  equivalent  to  the  right  hand  side  of  Eq.  (16),  where  one  should  set  t  =  ui,  x  =  5"'(r')- 
So  this  function  can  be  represented  in  the  form  provided  by  the  left  hand  side  of  Eq.  (16),  where  functions 
(f)'  and  x!  can  be  found  from  Eqs  (21)  and  (23)  i.e. 

F£(r)  =  VxW(  r)]  =  V$rB(r)  +  Vx(r^(r)),  (59) 

<^(r)  =  C12(Wl)Sn  r),  X%(r)  =  (C22(ul)-l)S™(r). 
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Substituting  this  into  Eq.  (58)  and  using  representation  of  the  conversion  operators  in  the  space  of  expansion 
coefficients,  Eq.  (52),  we  obtain 


v«  (y) 
(, k  (r) 


V<&( r)  +  Vx  (rxi  (r)) , 

oo  n  oo  n 

E  E  9l”Cl2("l)ST(r)  =  E  E  «CM, 


n=0  m=—n 
oo  n 


n— 0  m=—n 

oo  n 


X/  (r)  =  E  E  r)  =  E  E  xEW 


where 


n=0  m=—n 


n= 0  m=—n 


C  =  E[Ci?(“') 


,  Sir, 


n,m 

i 


71+1 


.  .  ,  n  —  777.  +  1  m_i  ,  ,  77  +  m  +  1  m_|_i  .  m 

y^ix  —  t'bJiy)  —  5)n  —  (uix  +  iwiy)  —  gln  —  i^iz'm9in 


Xln  = 


E  [c2?  m  - 1 


1 

n 


-  iuix)g™n t\  -  ^Pzmn_! 


,  (X?o  =  0) 


(60) 


(61) 


In  the  FMM  coefficients  cjj^  arc  generated  routinely  (as  coefficients  of  expansion  of  monopoles).  So  a  small 
modification  of  an  available  routine  is  needed  to  obtain  <j>^  and  x”rl  l°r  a  given  vector  u+ 

It  is  also  not  difficult  to  show  that  harmonic  functions 


=  r  ■  V^z  =  Vr<t>u  mi  =  xi  +  r  •  Vxz  =  {X  +  VT)xh  (62) 

which  spectra  arc  simply  related  to  scalar  potentials  for  source  l,  arc  nothing  but  dipoles,  and  so  they  can  be 
computed  using  standard  subroutines  for  dipole  expansions.  Indeed,  taking  scalar  product  of  both  sides  of 
equation 

V^z  (r)  +  V  x  (rXz  (r))  =  V  x  (63) 

lr  “ 

with  r,  we  can  see  that 

1^1  =  r  •  V  x  Ul  =  (r t  xu>i)-(  * — ,  (64) 

lr  —  rd  V|r-rz|V 

i.e.  ij) i  is  a  dipole  with  moment  p/  =  r/  x  uj.  Taking  curl  of  both  sides  of  Eq.  (63),  we  obtain,  using  identity 
VxVx=V(V-)  —  V2,  and  V2  (r xi)  =  2  Vxz,  which  is  valid  for  any  harmonic  function  x 

V  [V  •  (r xi  (r))  -  2Xi  (r)]  =  V  (V  •  ■  (65) 

So,  up  to  the  integration  constant 

^z  =  V-  [r  xi  (r)]  -  2xi  (r)  =  V  •  -  — 1  =  Ul  —  (66) 

lr  —  rd  \r-ri\6 

i.e.  mi  is  a  dipole  with  moment  pz  =  — c oi. 

Note  that  for  reconstruction  of  the  vector  field  from  the  curl  and  divergence,  expansion  of  a  monopole 
of  intensity  —  qi  should  be  added  to  +  (see  Eq.  (8)). 
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4.3  Evaluation  of  the  vector  field 


Other  procedures,  which  can  be  accelerated  arc  related  to  the  final  step  of  the  FMM  for  the  dense  matrix- 
vector  product.  As  a  result  of  the  FMM  algorithm  /(-expansions  of  scalar  potentials  are  obtained  about  the 
center  y*  ,of  an  evaluation  box  containing  receiver  point  y3 

oo  n  oo  n 

<AW  =  E  E  crew,  xW  =  E  E  xXlr),  r  =  y-y„.  m 

n=0  m=—n  n=0  m=—n 

Cartesian  components  of  the  velocity  then  can  be  obtained  by  projection  of  Eq.  (5)  to  the  basis  vectors  i7: ,  iy, 
and  L  as  follows 


Vk  =  ifc  •  v  =  ifc  •  V0  +  ifc  •  V  x  (ry)  =  +  Drxifcx,  k  =  x,y,z.  (68) 

since  i^Vx  (ry)  =  i^.  •  (Vy  x  r)  =  (r  x  ij.)  •  Vy.  Using  representations  of  the  above  operators  in  the 
space  of  expansion  coefficients  (43)  and  (50),  where  t  =  i&,  we  determine 

oo  n 

Vk  =  V™nRn(r),  k  =  X,y,Z,  (69) 

n=0  m=—n 

<C  =  \  [i<Pn+!+i&  +  (n-m)xZ+1  ~(n  +  m) x™"']  ■ 

v?„  =  -c+i-^xr 

Furthermore,  consider  computation  of  the  vortex  stretching  at  evaluation  point  y j  (r j  =  y?— y.J,  as¬ 
suming  that  the  vorticity  vector  at  this  point  is  us3.  The  stretching  is  a  vector 


si  = 


Oj  •  V)  v  (r) |r=r .  =  Vu.v  (r)|r=r  =J^ik  V^.vk  (r)|p=. 

k 


(70) 


Flence,  the  Cartesian  components  of  this  vector  can  be  obtained  simply  from  computed  coefficients  v™ ,  Eq. 

(  Ft} 

(69),  to  which  sparse  operator  should  be  applied  (see  Eq.  (43)): 


Sjk 


S 


m 

jkn 


oo  n 


E  E  sjlnK(rj), 


n= 0  m=—n 

1 


2  L 


(iOjy  +  ilOjx)  vk  n+1 


k  =  x,y,z, 

{tUjy  —  iujjx)v™++ 1 


U}jzvk,n+ 1- 


(71) 


This  shows  that  computation  of  this  term  can  be  also  performed  efficiently,  and  the  data  on  expansion 
coefficients  of  (j)  and  y  can  be  easiliy  converted  to  the  parameters  required  for  computation  of  a  vortical 
flow.  In  practice,  it  can  be  more  efficient  to  compute  expansion  coefficients  u'^n  for  functions  uik  =  U;(  v\- , 
L  k  =  1,2,3,  which  do  not  depend  on  the  evaluation  point  j  and  then  form  appropriate  coefficients  sjkn  for 
each  point  as 


3 

777  '  777 

sjkn  =  /  ^ jlulkn • 
1=1 


(72) 


Note  that  uik  arc  components  of  tensor  Vv.  So  as  these  quantities  arc  available,  contraction  [3  =  Vv  :Vv 
can  be  computed  (this  is  needed  for  compressible  flows,  see  a  remark  at  the  end  of  section  “Statement  of  the 
Problem”).  Also,  computation  of  Vv  provides  the  strain  tensor,  which  can  be  used  for  modeling  of  complex 
fluids. 
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5  Numerical  tests 


For  the  numerical  tests  we  used  basic  FMM  code  with  the  RCR-decomposition  of  translation  operators 
modified  for  two  harmonic  functions.  Conversion  operators  in  the  R-  or  S-  basis  were  executed  after  coaxial 
translation  operators,  as  described  in  [11].  Additional  small  modifications  were  used  in  the  algorithm  to 
compute  R-basis  functions  for  real  harmonic  functions  recursively,  as  presented  in  [13].  In  contrast  to  [13] 
no  optimizations  of  the  algorithm  were  used  (no  GPU,  standard  189  M2L  translation  stencils,  no  variable 
truncation  number,  etc.),  as  the  main  purpose  of  this  paper  was  to  provide  a  basic  comparative  performance 
and  accuracy  test  of  the  method.  Nonetheless,  to  speedup  the  tests  Open  MP  parallelization  was  used,  which 
for  4  core  PC  provided  parallelization  efficiency  close  to  100  percent.  Wall  clock  times  reported  below  were 
measured  for  Intel  QX6780  (2.8  GHz)  4  core  PC  with  8GB  RAM. 

5.1  Error  tests 

The  first  test  we  conducted  is  related  to  the  numerical  errors  of  computation  of  the  velocity  and  stretching 
term.  Also  for  comparisons  we  executed  the  FMM  for  a  single  harmonic  function  and  measured  numerical 
errors  in  potential  and  its  gradient.  There  arc  two  basic  sources  of  the  errors:  first,  the  FMM  errors,  due  to 
truncation  of  the  infinite  series.  These  errors  arc  controlled  by  the  truncation  number  p  (so  infinite  series 
(29)  were  replaced  by  the  first  p 2  terms,  n  =  0,  ..../>  —  1;  m  =  —n, ...,  n),  which  we  varied  in  the  tests. 
Second,  the  roundoff  errors,  which  in  our  computations  with  double  precision  in  the  range  of  tested  p  were 
smaller  than  the  truncation  errors  (the  roundoff  errors  were  observed  for  potential  computations  at  p>  25). 
The  basic  test  was  performed  for  N  sources/receivers  distributed  randomly  and  uniformly  inside  a  cube. 
The  error,  C2,  was  measured  in  the  L2  relative  norm  based  on  1000  points  randomly  selected  from  the  source 
set  (our  previous  tests  with  the  number  of  reference  points  using  direct  computations  up  to  100,000  points 
show  that  even  100  points  provide  sufficient  confidence  for  the  L2-norm  error,  see  [11]).  For  the  reference 
solution  the  velocity  field,  stretching  term,  potential  and  gradient  were  computed  directly. 

Figure  1  illustrates  behavior  of  the  computed  errors  for  the  velocity  and  stretching  term.  For  reference 
dependences  of  the  respective  errors  in  the  harmonic  potential  and  in  its  gradient  arc  shown.  It  is  seen  that 
starting  with  p  ~  7  spectral  convergence  is  observed  for  all  cases.  It  is  also  noticeable,  that  the  errors  in 
potential  computations  arc  substantially  smaller  than  that  for  the  gradient  or  higher  derivative  computations. 
There  arc  two  basic  reasons  why  this  happens.  First,  the  effective  truncation  number  for  each  derivative 
is  smaller  by  one  compared  to  the  potential,  and  second  that  in  the  truncated  term  for  the  derivative  an 
additional  factor  ~  p  appeal's  (indeed,  one  can  imagine  series  over  monomials  xn,  which  being  truncated 
to  the  first  p  terms  provide  error  0  {xp)\  differentiation  of  this  series  provides  nxn~l  monomials  and  the 
error  bound  O  (pxp~  1 )  for  the  same  x  and  p)  (we  checked  also  this  numerically  and  found  consistency  of 
the  orders  of  the  measured  errors  with  this  explanation).  In  any  case  the  difference  in  the  errors  between 
the  gradient  computations  and  those  for  the  velocity/stretching  is  not  very  large  and  some  increase  of  the 
truncation  number  can  compensate  such  a  difference. 

5.2  Performance  tests 

Peculiarity  of  the  FMM  is  that  for  different  problems  different  depth  of  the  octrees,  Zmax,  should  be  used 
to  minimize  the  total  execution  time.  So  for  all  reported  test  cases  we  conducted  optimization  study  and 
profiling  of  the  algorithm.  Some  results  of  profiling  (wall  clock  time  in  seconds)  for  optimized  benchmark 
cases  with  random  uniform  distributions  of  sources  inside  a  cube  and  on  the  surface  of  a  sphere  are  provided 
in  Tables  1  and  2.  In  these  tables  v  and  s  mark  computations  of  the  velocity  and  stretching  term  for  vortical 
flows,  while  0  and  Vd>  refer  to  a  reference  case  for  the  scalar  Laplace  equation,  where  the  potential  or 
both  potential  and  its  gradient  should  be  computed.  The  total  initialization  time,  which  includes  setting  of 
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Figure  1:  Dependences  of  the  relative  FMM  errors  in  the  L->  norm  on  the  truncation  number,  p.  Errors 
were  computed  over  1000  random  points  for  N  =  220  sources  of  random  intensity  distributed  uniformly 
randomly  inside  a  cube.  The  maximum  level  of  space  subdivision  Zmax  =  5. 


the  data  structure  and  precomputations  related  to  translation  operators  and  can  be  amortized  for  a  constant 
source/receiver  set  but  for  different  input  vectors  is  reported  separately  from  the  total  run  time.  As  one  can 
see  this  time  is  relatively  small,  while  for  dynamic  problems  it  should  be  added  to  the  total  run  time.  Tables 
also  separate  the  time  for  sparse  and  dense  matrix-vector  products,  produced  by  the  FMM.  The  latter  is  also 
expanded  to  show  timing  of  the  basic  FMM  stages,  which  include  generation  of  the  multipole  expansions, 
upward  and  downward  passes  involving  translations,  and  evaluation  of  the  local  expansions.  Truncation 
number  for  all  cases  was  p  =  12,  which  provides  errors  eg  ~  10-5  for  the  velocity  and  stretching  term 
computations,  while  smaller  errors  for  cp  and  Vet  (see  Fig.  1). 


Table  1:  Profiling  of  the  FMM  for  random  uniform  distribution  of  sources  inside  a  cube,  p  =  12. 


Case 

Zmax  Total  Init 

S-expansion 

Upward 

Downward 

R-evaluation 

Sparse  MV 

Total  Run 

N  =  2iy 

v  and  s 

4 

0.55 

0.55 

0.04 

2.65 

0.52 

25.9 

29.7 

v  alone 

4 

0.55 

0.55 

0.04 

2.65 

0.34 

16.4 

20.0 

<j>  and  V  (p 

5 

1.20 

0.20 

0.12 

10.7 

0.36 

3.95 

15.3 

cp  alone 

5 

1.20 

0.20 

0.12 

10.7 

0.21 

1.89 

13.1 

N  =  22U 

v  and  s 

5 

1.71 

1.05 

0.27 

25.3 

1.11 

14.2 

41.9 

v  alone 

5 

1.71 

1.05 

0.27 

25.3 

0.59 

9.04 

36.3 

<j>  and  V  <f> 

5 

1.71 

0.39 

0.12 

10.7 

0.71 

15.4 

27.3 

<fi  alone 

5 

1.71 

0.39 

0.12 

10.7 

0.43 

7.32 

19.0 

The  tables  show  that  in  the  cases  when  the  number  of  translations  for  single  potential  cp  for  the  scalar 
Laplace  equations  and  coupled  potentials  o  and  x  for  the  DCVLE  is  the  same  (the  same  Zmax)  the  translation 
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Table  2:  Profiling  of  the  FMM  for  random  uniform  distribution  of  sources  on  a  sphere  surface,  p  =  12. 


Case  x  Total  Init 

S-expansion 

Upward 

Downward 

R-evaluation  Sparse  MV 

Total  Run 

N  =  2iy 

v  and  s 

7 

1.23 

0.88 

0.34 

9.48 

0.66 

3.75 

15.1 

v  alone 

6 

0.66 

0.63 

0.10 

2.46 

0.34 

8.66 

12.2 

<j>  and  V  <f> 

7 

1.23 

0.23 

0.15 

3.93 

0.41 

3.90 

8.62 

<fi  alone 

7 

1.23 

0.23 

0.15 

3.93 

0.26 

1.86 

6.43 

N  = 

v  and  s 

7 

1.81 

1.29 

0.34 

9.48 

1.30 

14.1 

26.5 

v  alone 

7 

1.81 

1.29 

0.34 

9.48 

0.70 

8.86 

20.7 

<j>  and  V  4> 

7 

1.81 

0.46 

0.15 

4.14 

0.84 

15.3 

20.9 

<fi  alone 

7 

1.81 

0.46 

0.15 

4.14 

0.52 

7.30 

12.6 

time  for  the  latter  case  approximately  2  times  larger  that  is  an  expected  result.  The  translation  time  ratio, 
in  fact,  is  slightly  larger  than  2,  which  can  be  explained  by  two  factors.  First,  this  is  due  to  increase  in 
the  size  of  the  arrays  representing  expansions  and  more  time  needed  for  data  access,  and,  second,  by  the 
presence  of  the  conversion  operators.  The  tables  also  show  that  the  time  for  sparse  matrix- vector  products  for 
velocity  only  computations  in  DCVLE  is  slightly  larger  than  for  potential  only  computations  in  a  harmonic 
FMM,  while  the  time  for  the  same  operations  for  velocity  and  stretching  computations  arc  slightly  smaller 
than  for  potential  and  gradient  computations.  Note,  however,  that  if  an  additional  near- field  kernel  should 
be  computed,  which  may  involve  computation  of  special  functions  (exponents,  error  integrals,  etc.)  the 
time  for  the  sparse  matrix-vector  product  would  increase,  while  the  translation  paid  would  not  be  affected. 
Also  note  that,  theoretically,  in  the  optimized  algorithm  increase  of  the  complexity  of  the  sparse  matrix- 
vector  product  k  times  affects  the  total  complexity  as  y/k.  The  ratio  of  the  total  time  for  the  velocity  and 
stretching  computations  to  the  time  of  potential  and  gradient  computations  depends  on  the  problem  size  and 
source/receiver  distributions,  but  in  all  our  numerical  experiments  this  ratio  never  exceeded  2  (except  of 
one  outlier  at  N  =  1024,  see  Fig.  3),  while  in  average  it  was  about  1.5.  Finally,  for  the  same  maximum 
level  of  space  subdivision,  we  can  see  that  both  most  expensive  parts  contributing  to  the  total  complexity, 
translations  and  sparse  matrix-vector  product,  for  the  velocity  and  stretching  computations  approximately 
2  times  more  expensive  than  that  for  a  single  harmonic  potential.  So,  a  factor  2  is  expected  for  this  ratio. 
Slightly  larger  ratios  were  observed  in  the  numerical  tests  due  to  other  parts  of  the  algorithm  and  overheads 
mentioned  above. 

Figure  2  illustrates  dependence  of  the  wall  clock  time  on  the  number  of  sources  N,  which  in  ah  cases 
was  set  to  be  equal  to  the  number  of  receivers.  This  is  the  case  for  uniform  random  distributions  inside  the 
cube.  It  is  seen  that  at  large  N  the  algorithm  scales  linearly,  and  the  time  for  velocity  and  stretching  term 
computations  is  always  larger  than  that  for  scalar  potential  only  computations  approximately  two  times. 

Figure  3  illustrates  the  wall  clock  FMM  run  time  ratio  of  velocity  and  stretching  to  potential  and  gradient 
computations  for  different  p  and  N  =  2k,  k  =  10, ...,  20.  It  is  seen  that  for  k  >  11  this  ratio  is  larger  than 
1  and  smaller  than  2  with  the  mean  value  about  1.5.  Oscillations  between  1  and  2  can  be  explained  by 
optimization  of  the  octree  depth  which  was  independent  for  each  case  reported. 

6  Conclusion 

The  main  goal  of  this  study  was  to  develop  an  efficient  method  for  fast  summation  of  elementary  vor¬ 
tices.  Numerical  tests  confirm  the  validity  of  the  theory  presented  and  efficiency  of  the  method.  Our  re¬ 
sults  show  that  one  should  expect  approximately  2  times  increase  of  the  computation  time  for  velocity  and 
vorticity  stretching  term  computations  compared  to  a  single  harmonic  function  computations  for  the  same 
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Figure  2:  The  wall  clock  time  for  the  FMM  for  computation  of  vortical  and  potential  flows  (different  terms 
and  compinations).  The  straight  line  shows  linear  dependence.  For  all  cases  the  sources  are  distributed 
randomly  and  uniformly  inside  a  cube;  the  truncation  number  is  constant,  p  =  12. 

FMM  octree  and  the  truncation  number.  Compared  to  potential  and  gradient  computations  for  the  scalar 
Laplace  equation  this  increase  varies  in  the  range  from  1  to  2  times  (average  1.5)  depending  on  particular 
source/receiver  distribution,  truncation  number,  etc. 

An  interesting  observation  from  the  study  is  that  a  general  reconstruction  of  vector  fields  from  given 
curl  and  divergence  can  be  obtained  via  the  present  method,  which  operates  only  with  two  scalar  potentials. 
This  may  have  application  to  many  other  fields  of  physics,  including  plasma  physics,  electromagnetism,  etc. 
In  this  sense  the  DCVLE  appeal's  to  be  a  fundamental  equation,  the  solutions  of  which  can  be  accelerated 
via  the  harmonic  FMM.  As  is  shown,  modifications  of  standard  FMM  programs  are  relatively  easy,  and 
require  tracking  of  two  harmonic  functions,  instead  of  one,  and  implementation  of  the  conversion  operators 
used  in  each  translation.  Such  operators  are  very  sparse  and  simple  (especially  for  the  case  of  the  RCR- 
decomposition)  and  their  execution  does  not  create  substantial  overheads.  In  terms  of  further  acceleration 
of  computations  it  is  natural  to  consider  implementations  of  the  method  on  graphics  processors  (GPUs)  for 
which  the  vortex  methods  are  developed  and  tested  (e.g.  [20]). 
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